Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New name for hint::black_box: pretend_used #74853

Closed
wants to merge 2 commits into from

Conversation

jonhoo
Copy link
Contributor

@jonhoo jonhoo commented Jul 28, 2020

EDIT (2020-07-29): Changed from assume_used to pretend_used following #74853 (comment)

The current name is a legacy from before RFC 2360. The RFC calls the
method bench_black_box, though acknowledges that this name is not be
ideal either.

The main objection to the name during and since the RFC is that it is
not truly a black box to the compiler. Instead, the hint only encourages
the compiler to consider the passed-in value used. This in the hope that
the compiler will materialize that value somewhere, such as in memory or
in a register, and not eliminate the input as dead code.

This PR proposes to rename the method to pretend_used. This clearly
indicates the precise semantics the hint conveys to the compiler,
without suggesting that it disables further compiler optimizations. The
name also reads straightforwardly in code: hint::pretend_used(x) hints
to the compiler that it should pretend that x is used in some
arbitrary way.

pretend_used also rectifies the secondary naming concern the RFC
raised with the names black_box and bench_black_box; the potential
confusion with "boxed" values.

If this change lands, it completes the naming portion of #64102.

@rust-highfive
Copy link
Collaborator

r? @shepmaster

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 28, 2020
@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 28, 2020

I'd like to make it clear that I opened this PR primarily as a way to direct conversation from #64102 (comment) for a concrete change. If folks don't like the proposed name, I am happy to close this and continue discussion in #64102.

@bors
Copy link
Contributor

bors commented Jul 28, 2020

☔ The latest upstream changes (presumably #73265) made this pull request unmergeable. Please resolve the merge conflicts.

@bjorn3
Copy link
Member

bjorn3 commented Jul 28, 2020

assume_used feels to me like the compiler is allowed to assume that the input and output are identical, allowing const folding of the user of the output value, while only forbiding the removal of the creator of the input value.

@LukasKalbertodt
Copy link
Member

I like the assume_ part, but agree with @bjorn3 in that assume_used seems only fitting for the "eat the result of my calculation" use case, but not for the "hide this constant from the compiler" use case.

@petertodd
Copy link
Contributor

Wikipedia describes black boxes as having an opaque implementation, so one option might be simply opaque()

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 28, 2020

assume_used feels to me like the compiler is allowed to assume that the input and output are identical, allowing const folding of the user of the output value, while only forbiding the removal of the creator of the input value.

The intention (though I agree the name doesn't quite communicate this) was that the compiler should assume that assume_used uses the provided input value in any legal fashion. That could include taking references to it, mutating it, or anything else it could think of. So constant folding across it should not be okay. I guess it depends on what you take "use" to mean. To me, using a value includes potentially modifying it. Can we think of another word than "use" to indicate that the function does arbitrary (legal) things to the passed-in value?

I like the assume_ part, but agree with @bjorn3 in that assume_used seems only fitting for the "eat the result of my calculation" use case, but not for the "hide this constant from the compiler" use case.

I think my take here is, as above, that the compiler can't constant-fold across assume_used, because it should assume it does not know how the value is used (e.g., the use might include modifications). I agree a different word than "use" may be appropriate, though none immediately come to mind. assume_arbitrary_impl maybe, though that reads much more awkwardly?

Wikipedia describes black boxes as having an opaque implementation, so one option might be simply opaque()

That would work if the body was actually opaque to the compiler, but I don't think that's actually the case. The compiler does still get some insight into what goes on, as per rust-lang/rfcs#2360 (comment), and so calling it opaque may give the reader expectations that are not true.

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 28, 2020

@tmiasko I'll respond to your comment over here.

I wonder how would it go along with other assume-like functions in
the standard library. There is assume intrinsic and a family of assume_init
functions, for those it is responsibility of the caller to guarantee the thing
being assumed. In the proposed assume_used, the role of the callee.

Actually, I think the existing assume_ methods set a good precedent for the use of assume. There, like here, we are telling the compiler to assume that something is true from this point forward. For me, the responsibility on the caller comes from those methods being unsafe, not from the assume in the name.

@petertodd
Copy link
Contributor

@jonhoo Good point. Another suggestion: pretend_used

As in, we're telling the compiler to pretend that value is used to calculate a new result, even though it isn't.

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 29, 2020

Oooh, I quite like pretend_used. We don't have precedence for the use of "pretend" in std (unlike assume), but I don't mind it. If you (and @bjorn3) both prefer pretend_used to assume_used, I'm happy to update the PR.

The current name is a legacy from before RFC 2360. The RFC calls the
method `bench_black_box`, though acknowledges that this name is not be
ideal either.

The main objection to the name during and since the RFC is that it is
not truly a black box to the compiler. Instead, the hint only encourages
the compiler to consider the passed-in value used. This in the hope that
the compiler will materialize that value somewhere, such as in memory or
in a register, and not eliminate the input as dead code.

This PR proposes to rename the method to `pretend_used`. This clearly
indicates the precise semantics the hint conveys to the compiler,
without suggesting that it disables further compiler optimizations. The
name also reads straightforwardly in code: `hint::pretend_used(x)` hints
to the compiler that it should pretend that `x` is used in some
arbitrary way.

`pretend_used` also rectifies the secondary naming concern the RFC
raised with the names `black_box` and `bench_black_box`; the potential
confusion with "boxed" values.

If this change lands, it completes the naming portion of rust-lang#64102.
@jonhoo jonhoo changed the title New name for hint::black_box: assume_used New name for hint::black_box: pretend_used Jul 29, 2020
@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 29, 2020

PR updated 👍

@petertodd
Copy link
Contributor

Oooh, I quite like pretend_used. We don't have precedence for the use of "pretend" in std (unlike assume), but I don't mind it. If you (and @bjorn3) both prefer pretend_used to assume_used, I'm happy to update the PR.

Re: lack of precedence, I actually think that's an advantage: "assume" is used in production code to enable optimizations by assuming something that is true, is true; "pretend" will be used in testing code to disable optimizations by pretending that something which is not actually true, is true.

Suggested doc text:

"An identity function that asks the compiler to pretend that a value is used in an arbitrary computation, usually to disable optimizations in benchmarking code."

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 29, 2020

@LukasKalbertodt and @CryZe: your emoji reactions suggest that you did not like assume_used. How do you feel about the newly proposed pretend_used? If you do not like that either, it would be helpful if you could describe what makes you feel negatively about it so we can see if we can come up with a better name :)

@CryZe
Copy link
Contributor

CryZe commented Jul 29, 2020

pretend is a lot better but used still seems really ambiguous. Functions like assume_init imply that an assumption about something that is already true outside of the call holds. So in that specific case you need to already have initialized it and the function only tells the compiler about it. This is not the case with assume / pretend used, as in this case it's really something that the function is actively doing, that isn't already the case without the function call. So while pretend is a bit better, the whole function name still seems somewhat off too me, including the doc comment that does not clarify this all too much either.

What about any (combination) of these, which all seem a lot less ambiguous (though maybe some may imply too much?):

keep_around
dont_optimize_away
has_relevant_side_effects
keep_this

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 29, 2020

Yeah, the problem we're grappling with is that this method really promises (or rather, hints) quite little. And exactly what the compiler is likely to do as a result is very compiler-dependent. In reality, all the method does is tell the compiler "consider this value as used in every conceivable, but legal way". No more, no less. What that means is up to the compiler. That's why I decided to propose the _used names — they get at what the hint actually is, as opposed to what the compiler might do with that hint.

To use your proposals as examples:

  • keep_around: this is a little too weak — the hint should tell the compiler to not apply constant folding across the call (since the call may change the value), but this name does not imply that.
  • dont_optimize_away: a similar issue to the above — it implies that all the hint does is tell the compiler not to remove the value, but in practice the hint is (very slightly) stronger than that.
  • has_relevant_side_effects: this doesn't seem quite right to me — this might confuse a reader into thinking that the function itself has side effect, when it does not. or alternatively this might read as "assume that the given expression has side-effects, rather than the hint function having side-effects, which would imply that the expression cannot be optimized. but that is not the case.
  • keep_this: this also implies (to me) that the expression passed to keep_this may not be optimized, whereas that is not the case. it also shares the same issues as above around constant folding.

This is not the case with assume / pretend used, as in this case it's really something that the function is actively doing, that isn't already the case without the function call.

I'm not quite sure I follow this — assume_init, at least as I read it, is telling the compiler to assume that the given value is now initialized. It's the unsafe part that is the caller agreeing to a contract. In the same way that assume/pretend here is a directive to the compiler that something is true. In this case, the thing we're telling the compiler is true is that the passed-in value is being used (even though it doesn't look like it is).

@bjorn3
Copy link
Member

bjorn3 commented Jul 29, 2020

assume is normally used to allow the compiler to perform extra optimizations by assuming that something is true. pretend here is used to forbid the compiler from performing certain optimizations by pretending that something is not true. They are eachothers counterpart.

use std::mem::MaybeUninit;

#[inline(never)]
#[no_mangle]
fn random() -> [isize; 32] {
let r = unsafe { MaybeUninit::uninit().assume_init() };
// Avoid optimizing everything out.
black_box(r)
pretend_used(r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretend_use / blackbox / ... is the completely wrong function to call here right? It doesn't help with correctness and this whole code relies on the black box hiding the fact that it's UB to read uninitialized memory from the compiler. (Same in the address sanitizer test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think what's going on here is actually that we want the code to be incorrect, but without pretend_used (as the comment implies) the r would be entirely optimized out by the compiler. pretend_used here causes the compiler to leave r in, which in turn means that the undefined behavior can be detected by the sanitizer.

@CryZe
Copy link
Contributor

CryZe commented Jul 29, 2020

Okay, I believe at this point I'm even more confused than before. My understanding of this function is that it's an identity function, but this is intentionally hidden from the compiler (though not guaranteed) to prevent the compiler from knowing it's an identity function, so the compiler can't assume anything about the value anymore, meaning that no constant folding can happen and any other assumptions the compiler made about the value are completely lost.

pretend_used / assume_used seem to completely distract from this, blackbox does a much better job at conveying that. Or what exactly am I missing?

If it is (and it apparently is?) more subtle than that, then the current documentation (and name) do an incredibly bad job of conveying what kind of optimizations exactly are and are not posssible, i.e. using a value, what does that even mean to the user of this API, as someone who doesn't even know about internals of compiler optimizations.

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 29, 2020

It definitely is a bit confusing.

Part of the problem is that the actual semantics of the method really are "pretend that the program may have any legal Rust impl for this function". That does include things like constant folding (since it might return a different value than was provided) and dead code elimination (since the method could have side-effects), but it is not truly a black box in the way black box is normally defined in other contexts.

@hanna-kruppe had a good summary over in #64102 (comment), and there's also @cramertj's original comment over in rust-lang/rfcs#2360 (comment). The concern is specifically, assuming I've understood them correctly, that if it is called black_box, people may start to rely on it actually having black-box semantics for correctness purposes. But since this is only a hint to the compiler, that is is under no obligation to follow, those assumptions that unsafe code may rely on aren't guaranteed. A name that tries to not claim to actually be a black box, but instead at what the hint is trying to tell the compiler, means that the function is less likely to be used for correctness.

I completely agree with you that the current documentation is inaccurate. I tried to make it slightly better in this PR, but it's difficult because the "meaning" of this function to the end user is so ill-defined. I definitely do not think the updated docs in this PR are sufficient, they were only meant to be a starting point for making progress on #64102.

@LukasKalbertodt
Copy link
Member

@jonhoo The prefix pretend_ is great, as many already mentioned. But just like @CryZe, I'm still not convinced by _used, unfortunately. While I know that "use" is a pretty general word, I personally intuitively associate "used" with "read-only". So pretend_used(27) doesn't feel to me like we are pretending that the value might change. (Maybe that's just because I'm not a native speaker; if so please tell me!) So the assume -> pretend change didn't change anything about what I said in my previous comment:

assume_used seems only fitting for the "eat the result of my calculation" use case, but not for the "hide this constant from the compiler" use case.

Just as counter-example, pretend_modified (or pretend_altered) would be on the other side: good for the "hide a constant" case, not so much for the "sink for computation result" case.

I was thinking about a better word since yesterday, but didn't come up with anything useful. I guess pretend_used_and_modified would capture what I'm after, but that's obviously long and meh. Some other ideas to throw out there (though, none of them ideal IMO):

  • pretend_opaque
  • remove_assumptions, remove_knowledge
  • assume_nothing

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 30, 2020

Hmm, yeah, I see what you mean. For pretend_used to make sense, the reader does have to read "used" as "any impl", not just "reads". That said, I feel like the documentation could clarify this? The name of the function doesn't need to be perfect, it just has to read mostly intuitively, and crucially shouldn't be thought to give more guarantees that it does (which is the problem with black_box. Giving additional details about the kinds of use that "can be pretended" in the documentation seem (to be) like a fair compromise. But I may also be alone in feeling that way :)

Another take would be to go with something like std::hint::arbitrary_impl?

@Mark-Simulacrum
Copy link
Member

Arbitrary impl could also be a no-op. In some sense, maybe "assume nothing" is what we want to get at -- i.e., the compiler should not presume that it knows anything about the function call being made and what it's effects are, beyond the normal "Rust guarantees" (i.e., mutable aliasing etc).

But assume_nothing looks quite odd...

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 30, 2020

@Mark-Simulacrum Another challenge with assume_nothing is that it implies stronger guarantees that the function actually gives. The problem is pretty similar to the one for black_box — people might use the hint for correctness purposes, thinking that they are telling the compiler to assume nothing about the passed-in value.

There's also a challenge here around the "object" of the verb that the function represents. assume_nothing(3 + x) (and some other names that have come up) imply (to me) that the compiler should assume nothing about 3 + x, rather than about what the assume_nothing function itself does with the value it is passed. The former reading might make the reader think that 3 + x won't be computed at compile time for example, when this function does not actually guarantee that.

@CryZe
Copy link
Contributor

CryZe commented Jul 30, 2020

Maybe we want may_ / try_ / etc. as a prefix to indicate that it's not actually guaranteed.

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 30, 2020

Yeah, that might be a good idea. I prefer try_, but it's so strongly associated with Result return types that I feel like it'd probably be distracting. may_ as a prefix makes it seem more like part of the directive to the compiler of "is allowed to", as opposed to "try to pretend". Maybe suggest_?

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 30, 2020

There's also the argument that because this is in the hint module, it is never more than a hint, and should never be relied on for correctness? But that may be too subtle.

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 30, 2020

Here's a super weird suggestion: std::hint::arbitrary_safe_code.

@petertodd
Copy link
Contributor

petertodd commented Jul 30, 2020 via email

@jonhoo
Copy link
Contributor Author

jonhoo commented Jul 31, 2020

True, though mock_fn also doesn't carry any semantics about what the hint actually is, which I think is vital.

@jonhoo
Copy link
Contributor Author

jonhoo commented Aug 3, 2020

Another way might be pretend_arbitrarily_used?

@petertodd
Copy link
Contributor

petertodd commented Aug 3, 2020 via email

@crlf0710
Copy link
Member

crlf0710 commented Aug 5, 2020

I'm thinking about something like:

fn pretend_modify<T>(v: &mut T);

or even just

fn touch<T>(v: &mut T);

but this is a little too jargon-y, i think...

@jonhoo
Copy link
Contributor Author

jonhoo commented Aug 5, 2020

fn pretend_modify<T>(v: &mut T);

This probably isn't what we want, since pretend_modify is then limited to doing things that are legal with a &mut T, rather than with a T as currently written.

As for the names, pretend_modify is a little weird to me, since it seems to only pretend that the value is modified. Whereas in reality we want to communicate that the method can use T in any way that is legal in safe Rust. touch I agree is probably too jargon-y, and even if it weren't, it's a little odd since touch doesn't really imply "may read this memory" for example.

@tesuji
Copy link
Contributor

tesuji commented Aug 5, 2020

How about noopt, keep_unused ?

@jonhoo
Copy link
Contributor Author

jonhoo commented Aug 5, 2020

I'm not sure what noopt means? If it's short for no_optimization or dont_optimize, those don't work because the compiler still performs optimizations, we just suggest to it that it doesn't perform certain optimizations.

Could you say more about what you think keep_unused conveys?

@LukasKalbertodt
Copy link
Member

Just throwing a few more things out there:

  • conceal,
  • pretend_concealed
  • conceal_value
  • conceal_from_compiler
  • Or any of the above with veil instead of conceal

I think I actually somewhat like conceal and conceal_value. Seem fitting for both situations:

// Hide constant
number_of_primes_below(hint::conceal_value(1000));

// Hide result of computation
vec.sort();
hint::conceal_value(vec);

@petertodd
Copy link
Contributor

petertodd commented Aug 6, 2020 via email

@jonhoo
Copy link
Contributor Author

jonhoo commented Aug 8, 2020

Here's a different proposal: consume_and_produce.

The idea is that it conveys to the compiler that it will consume the value that is passed to it, in some unspecified way. The produce part is there to make it clear to readers that it will also produce the same value.

@Dylan-DPC-zz
Copy link

Given there's no consensus here on the naming, I'm going to close this as this stage. I think we could discuss this in an issue or a MCP (once t-libs officially starts doing MCPs) and then rework this. Thanks :)

@Dylan-DPC-zz Dylan-DPC-zz added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.